The problem of estimating the size of a population based on a subset of individuals observed across multiple data sources is often referred to as capture-recapture or multiple-systems estimation. This is fundamentally a missing data problem, where the number of unobserved individuals represents the missing data. As with any missing data problem, multiple-systems estimation requires users to make an untestable identifying assumption in order to estimate the population size from the observed data. If an appropriate identifying assumption cannot be found for a data set, no estimate of the population size should be produced based on that data set, as models with different identifying assumptions can produce arbitrarily different population size estimates—even with identical observed data fits. Approaches to multiple-systems estimation often do not explicitly specify identifying assumptions. This makes it difficult to decouple the specification of the model for the observed data from the identifying assumption and to provide justification for the identifying assumption. We present a re-framing of the multiple-systems estimation problem that leads to an approach which decouples the specification of the observed-data model from the identifying assumption, and discuss how common models fit into this framing. This approach takes advantage of existing software and facilitates various sensitivity analyses. We demonstrate our approach in a case study estimating the number of civilian casualties in the Kosovo war. Code used to produce this manuscript is available at github.com/aleshing/central-role-of-identifying-assumptions
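To make the role of the identifying assumption concrete, the sketch below shows the simplest two-list case: under the (untestable) assumption that the two lists record individuals independently, the classic Lincoln-Petersen estimator recovers the total population size from the list counts and their overlap. The function name and the counts are illustrative, not taken from the paper's case study.

```python
def lincoln_petersen(n1, n2, m):
    """Estimate population size from two overlapping lists.

    n1: individuals on list 1, n2: individuals on list 2,
    m: individuals appearing on both lists.

    Valid only under the identifying assumption that list
    memberships are independent; a different assumption (e.g. positive
    dependence between lists) yields a different estimate from the
    same observed counts. Uses the Chapman correction, which reduces
    small-sample bias relative to the raw estimate n1 * n2 / m.
    """
    if not 0 <= m <= min(n1, n2):
        raise ValueError("overlap must satisfy 0 <= m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Illustrative counts: 400 on list 1, 300 on list 2, 100 on both.
estimate = lincoln_petersen(400, 300, 100)
```

Changing the identifying assumption while leaving the observed counts fixed would change `estimate` arbitrarily, which is the paper's motivation for stating the assumption explicitly.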
-
Stochastic epidemic models (SEMs) fit to incidence data are critical to elucidating outbreak dynamics, shaping response strategies, and preparing for future epidemics. SEMs typically represent counts of individuals in discrete infection states using Markov jump processes (MJPs), but are computationally challenging as imperfect surveillance, lack of subject-level information, and temporal coarseness of the data obscure the true epidemic. Analytic integration over the latent epidemic process is impossible, and integration via Markov chain Monte Carlo (MCMC) is cumbersome due to the dimensionality and discreteness of the latent state space. Simulation-based computational approaches can address the intractability of the MJP likelihood, but are numerically fragile and prohibitively expensive for complex models. A linear noise approximation (LNA) that approximates the MJP transition density with a Gaussian density has been explored for analyzing prevalence data in large-population settings, but requires modification for analyzing incidence counts without assuming that the data are normally distributed. We demonstrate how to reparameterize SEMs to appropriately analyze incidence data, and fold the LNA into a data augmentation MCMC framework that outperforms deterministic methods statistically and simulation-based methods computationally. Our framework is computationally robust when the model dynamics are complex and applies to a broad class of SEMs. We evaluate our method in simulations that reflect Ebola, influenza, and SARS-CoV-2 dynamics, and apply our method to national surveillance counts from the 2013–2015 West Africa Ebola outbreak.
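The MJP representation the abstract refers to can be illustrated with a minimal Gillespie simulation of an SIR model, tracking cumulative incidence (new infections) rather than prevalence (current infecteds), since incidence is what surveillance counts report. This is a sketch of the general model class only; the rates and initial state are made up, and it implements none of the paper's LNA or MCMC machinery.

```python
import random

def simulate_sir(S, I, R, beta, gamma, t_max, rng):
    """Simulate one SIR trajectory as a Markov jump process.

    Two event types: infection (rate beta * S * I / N) and recovery
    (rate gamma * I). Returns cumulative incidence, i.e. the total
    number of new infections by time t_max; this differs from
    prevalence, which is the current value of I.
    """
    N = S + I + R
    t, incidence = 0.0, 0
    while I > 0 and t < t_max:
        infect_rate = beta * S * I / N
        recover_rate = gamma * I
        total_rate = infect_rate + recover_rate
        t += rng.expovariate(total_rate)  # exponential waiting time
        if t >= t_max:
            break
        if rng.random() < infect_rate / total_rate:
            S -= 1; I += 1; incidence += 1  # infection event
        else:
            I -= 1; R += 1                  # recovery event
    return incidence

# Illustrative parameters (basic reproduction number beta/gamma = 2).
rng = random.Random(1)
cases = simulate_sir(S=990, I=10, R=0, beta=0.5, gamma=0.25,
                     t_max=50.0, rng=rng)
```

Fitting such a model to coarsely observed, imperfectly reported counts of `cases` is exactly the latent-state integration problem that motivates the LNA-based data augmentation approach.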